NSF PAR Search | NSF Public Access Repository

Domain-adaptive neural networks improve cross-species prediction of transcription factor binding

https://doi.org/10.1101/gr.275394.121

Cochran, Kelly; Srivastava, Divyanshi; Shrikumar, Avanti; Balsubramani, Akshay; Hardison, Ross C.; Kundaje, Anshul; Mahony, Shaun (March 2022, Genome Research)

The intrinsic DNA sequence preferences and cell type–specific cooperative partners of transcription factors (TFs) are typically highly conserved. Hence, despite the rapid evolutionary turnover of individual TF binding sites, predictive sequence models of cell type–specific genomic occupancy of a TF in one species should generalize to closely matched cell types in a related species. To assess the viability of cross-species TF binding prediction, we train neural networks to discriminate ChIP-seq peak locations from genomic background and evaluate their performance within and across species. Cross-species predictive performance is consistently worse than within-species performance, which we show is caused in part by species-specific repeats. To account for this domain shift, we use an augmented network architecture to automatically discourage learning of training species–specific sequence features. This domain adaptation approach corrects for prediction errors on species-specific repeats and improves overall cross-species model performance. Our results show that cross-species TF binding prediction is feasible when models account for domain shifts driven by species-specific repeats.
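The abstract does not state which mechanism the augmented architecture uses to discourage species-specific features. One common way to express that goal is a domain-adversarial objective, sketched below as an illustration rather than as the authors' implementation. The function names, the weight `lam`, and the toy probabilities are all hypothetical.

```python
import math

def binary_cross_entropy(prob, label):
    """Cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12  # clamp to avoid log(0)
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def mean_loss(probs, labels):
    """Average cross-entropy over a batch."""
    return sum(binary_cross_entropy(p, y) for p, y in zip(probs, labels)) / len(probs)

def feature_extractor_objective(binding_loss, species_loss, lam=1.0):
    """Assumed domain-adversarial objective for the shared feature extractor:
    minimize the binding loss while maximizing the species-classifier loss,
    so that shared features carry little species-specific signal."""
    return binding_loss - lam * species_loss

# Toy batch (hypothetical values): binding predictions versus bound/unbound labels.
binding_loss = mean_loss([0.9, 0.2], [1, 0])
# A species classifier at chance (0.5) cannot tell the two species apart.
species_loss = mean_loss([0.5, 0.5], [1, 0])
objective = feature_extractor_objective(binding_loss, species_loss, lam=1.0)
```

In this sketch, a species classifier that performs at chance yields the largest species loss and therefore the lowest objective for the feature extractor, which is the behavior a domain-adaptive model is meant to encourage.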
WILDS: A Benchmark of in-the-Wild Distribution Shifts

Koh, Pang Wei; Sagawa, Shiori; Marklund, Henrik; Xie, Sang Michael; Zhang, Marvin; Balsubramani, Akshay; Hu, Weihua; Yasunaga, Michihiro; Phillips, Richard Lanas; Gao, Irena; et al. (January 2021, Proceedings of Machine Learning Research)
Distribution shifts, where the training distribution differs from the test distribution, can substantially degrade the accuracy of machine learning (ML) systems deployed in the wild. Despite their ubiquity in real-world deployments, these distribution shifts are under-represented in the datasets widely used in the ML community today. To address this gap, we present WILDS, a curated benchmark of 10 datasets reflecting a diverse range of distribution shifts that naturally arise in real-world applications, such as shifts across hospitals for tumor identification, across camera traps for wildlife monitoring, and across time and location in satellite imaging and poverty mapping. On each dataset, we show that standard training yields substantially lower out-of-distribution than in-distribution performance. This gap remains even with models trained by existing methods for tackling distribution shifts, underscoring the need for new methods for training models that are more robust to the types of distribution shifts that arise in practice. To facilitate method development, we provide an open source package that automates dataset loading, contains default model architectures and hyperparameters, and standardizes evaluations. The full paper, code, and leaderboards are available at https://wilds.stanford.edu.
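The gap between in-distribution and out-of-distribution performance described above can be illustrated with a minimal sketch. This does not use the WILDS package or its API. The record layout, the function names, and the toy hospital data are hypothetical and chosen only to show the calculation.

```python
def accuracy(pairs):
    """Fraction of (prediction, label) pairs that match."""
    if not pairs:
        raise ValueError("cannot compute accuracy on an empty split")
    return sum(1 for pred, label in pairs if pred == label) / len(pairs)

def id_ood_gap(records, train_domains):
    """Split (domain, prediction, label) records by whether the domain was
    seen in training, then return (ID accuracy, OOD accuracy, gap)."""
    in_dist, out_dist = [], []
    for domain, pred, label in records:
        target = in_dist if domain in train_domains else out_dist
        target.append((pred, label))
    id_acc = accuracy(in_dist)
    ood_acc = accuracy(out_dist)
    return id_acc, ood_acc, id_acc - ood_acc

# Toy predictions (hypothetical): hospital_A was seen in training, hospital_B was not.
records = [
    ("hospital_A", 1, 1), ("hospital_A", 0, 0),
    ("hospital_A", 1, 1), ("hospital_A", 0, 1),
    ("hospital_B", 1, 0), ("hospital_B", 0, 0),
    ("hospital_B", 1, 1), ("hospital_B", 0, 1),
]
id_acc, ood_acc, gap = id_ood_gap(records, train_domains={"hospital_A"})
print(f"ID accuracy: {id_acc:.2f}, OOD accuracy: {ood_acc:.2f}, gap: {gap:.2f}")
```

Reporting both accuracies alongside the gap, instead of a single pooled number, keeps a drop on unseen domains from being hidden by strong performance on the training domains.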
The Human Tumor Atlas Network: Charting Tumor Transitions across Space and Time at Single-Cell Resolution

https://doi.org/10.1016/j.cell.2020.03.053

Rozenblatt-Rosen, Orit; Regev, Aviv; Oberdoerffer, Philipp; Nawy, Tal; Hupalowska, Anna; Rood, Jennifer E.; Ashenberg, Orr; Cerami, Ethan; Coffey, Robert J.; Demir, Emek; et al. (April 2020, Cell)